Centrality

  • Definition of Centrality
  • Compare and contrast popular centrality measures on dataset
    • Degree
    • Closeness
    • Betweenness
    • Eigenvector


In [1]:
import networkx as nx
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
GA = nx.read_gexf('../data/ga_graph.gexf')

Degree Centrality

The degree of a node is the number of other nodes to which it is connected.

NetworkX's degree centrality is calculated by taking the degree of the node and dividing it by n-1, where n is the number of nodes in G.

$$ {C_D (u)} = \frac{deg(u)}{{n-1}} $$

⚠️ NOTE: In a directed graph, both in-degree and out-degree centrality can be calculated.
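To illustrate that note, here is a minimal sketch on a small hypothetical directed graph (not our dataset), showing how in-degree and out-degree centrality differ:

```python
import networkx as nx

# Hypothetical directed graph: a -> b, a -> c, c -> b
DG = nx.DiGraph([("a", "b"), ("a", "c"), ("c", "b")])

# In-degree centrality: incoming edges / (n - 1)
print(nx.in_degree_centrality(DG))   # 'b' receives 2 of a possible 2 edges, so 1.0

# Out-degree centrality: outgoing edges / (n - 1)
print(nx.out_degree_centrality(DG))  # 'a' sends 2 of a possible 2 edges, so 1.0
```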

Let's find the degree of our main character Grey.


In [3]:
GA.degree("grey")


Out[3]:
4

Likewise, we can find the degree of each cast member.


In [4]:
# See all of them (ok on this small graph)
GA.degree()


Out[4]:
{'addison': 3,
 'adele': 1,
 'altman': 2,
 'arizona': 1,
 'avery': 1,
 'bailey': 2,
 'ben': 1,
 'chief': 2,
 'colin': 1,
 'denny': 1,
 'derek': 2,
 'ellis grey': 2,
 'finn': 1,
 'grey': 4,
 'hank': 1,
 'izzie': 4,
 'karev': 7,
 'kepner': 1,
 'lexi': 3,
 'mrs. seabury': 1,
 'nancy': 1,
 "o'malley": 4,
 'olivia': 2,
 'owen': 2,
 'preston': 1,
 'sloan': 5,
 'steve': 1,
 'susan grey': 1,
 'thatch grey': 2,
 'torres': 4,
 'tucker': 1,
 'yang': 3}

In [5]:
# Here's the top 5.
sorted(GA.degree().items(), key=lambda x:x[1], reverse=True)[:5]


Out[5]:
[('karev', 7), ('sloan', 5), ('grey', 4), ("o'malley", 4), ('izzie', 4)]

While knowing the raw number is great, most centrality measures are normalized between zero and one so that they can be more easily compared to one another.

For the degree centrality measure, the normalized interpretation is really intuitive:

What percentage of nodes is this node connected to?

Or for our Grey's Anatomy example:

What percentage of the cast has this character been involved with?

Let's calculate the degree centrality for Grey.


In [6]:
# Degree for the 'Grey' node
degree_grey = GA.degree("grey")  # 4 romantic partners

# Total number of nodes (excluding Grey) 
total_nodes_minus_grey = len(GA.nodes())-1  # 31 characters in the cast, excluding Grey

# Degree centrality for Grey
degree_centrality_grey = (degree_grey / total_nodes_minus_grey)
print("Calculated degree centrality for Grey:", degree_centrality_grey)

# Double check
print("Networkx degree centrality for Grey:", nx.degree_centrality(GA)["grey"])

def check_equal(val1, val2):
    assert (val1 == val2),"Centrality measure calculated incorrectly!"
    return "Values match, good job!"

check_equal(degree_centrality_grey, nx.degree_centrality(GA)["grey"])


Calculated degree centrality for Grey: 0.12903225806451613
Networkx degree centrality for Grey: 0.12903225806451613
Out[6]:
'Values match, good job!'

Likewise, let's find the degree centrality for all characters.


In [7]:
degree_centrality = nx.degree_centrality(GA)
degree_centrality


Out[7]:
{'addison': 0.0967741935483871,
 'adele': 0.03225806451612903,
 'altman': 0.06451612903225806,
 'arizona': 0.03225806451612903,
 'avery': 0.03225806451612903,
 'bailey': 0.06451612903225806,
 'ben': 0.03225806451612903,
 'chief': 0.06451612903225806,
 'colin': 0.03225806451612903,
 'denny': 0.03225806451612903,
 'derek': 0.06451612903225806,
 'ellis grey': 0.06451612903225806,
 'finn': 0.03225806451612903,
 'grey': 0.12903225806451613,
 'hank': 0.03225806451612903,
 'izzie': 0.12903225806451613,
 'karev': 0.22580645161290322,
 'kepner': 0.03225806451612903,
 'lexi': 0.0967741935483871,
 'mrs. seabury': 0.03225806451612903,
 'nancy': 0.03225806451612903,
 "o'malley": 0.12903225806451613,
 'olivia': 0.06451612903225806,
 'owen': 0.06451612903225806,
 'preston': 0.03225806451612903,
 'sloan': 0.16129032258064516,
 'steve': 0.03225806451612903,
 'susan grey': 0.03225806451612903,
 'thatch grey': 0.06451612903225806,
 'torres': 0.12903225806451613,
 'tucker': 0.03225806451612903,
 'yang': 0.0967741935483871}

In [8]:
# Top 5.  Percent of cast this character has been with.
sorted(degree_centrality.items(), key=lambda x: x[1], reverse=True)[:5]


Out[8]:
[('karev', 0.22580645161290322),
 ('sloan', 0.16129032258064516),
 ('grey', 0.12903225806451613),
 ("o'malley", 0.12903225806451613),
 ('izzie', 0.12903225806451613)]

In [9]:
# Top 5.  Total # of partners this character has been with.
sorted(GA.degree().items(), key=lambda x: x[1], reverse=True)[:5]


Out[9]:
[('karev', 7), ('sloan', 5), ('grey', 4), ("o'malley", 4), ('izzie', 4)]

In [10]:
# apply measurements back to Graph
nx.set_node_attributes(GA, 'degree centrality', degree_centrality)

In [11]:
GA.node['karev']


Out[11]:
{'degree centrality': 0.22580645161290322, 'label': 'karev'}

Closeness Centrality

Closeness centrality measures how many "hops" it would take to reach every other node in a network (along shortest paths). It can be informally thought of as the 'average distance' to all other nodes.

In NetworkX, it is the reciprocal of that average, which normalizes the value into a 0 to 1 range.

$$ C_C (u) = \frac{n - 1}{\sum_{v=1}^{n-1} d(v, u)} $$

If you again take the reciprocal of this, you'll find the average distance to all other nodes.
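A quick sanity check of that formula on a tiny hypothetical path graph a - b - c (not the show data): from b the distances are d(b,a)=1 and d(b,c)=1, so C_C(b) = (3-1)/(1+1) = 1.0, and the reciprocal recovers the average distance.

```python
import networkx as nx

# Hypothetical 3-node path: a - b - c
P = nx.Graph([("a", "b"), ("b", "c")])

cc = nx.closeness_centrality(P)
print(cc["b"])      # (3 - 1) / (1 + 1) = 1.0
print(1 / cc["a"])  # average distance from a: (1 + 2) / 2 = 1.5
```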

⚠️ NOTE: If the graph is not completely connected, this algorithm computes the closeness centrality for each connected part separately. The closeness centrality is normalized to (n-1)/(|G|-1) where n is the number of nodes in the connected part of graph containing the node. [Source]

Why should we care about closeness centrality?

Degree centrality measures might be criticized because they only take into account the immediate ties that an actor has, or the ties of the actor's neighbors, rather than indirect ties to all others. One actor might be tied to a large number of others, but those others might be rather disconnected from the network as a whole. In a case like this, the actor could be quite central, but only in a local neighborhood. [Source]

In our example, closeness centrality could perhaps help us understand which characters have the greatest potential to spread an infectious disease or STD across the cast (an admittedly dark plot arc).

Let's calculate the closeness centrality for Grey. First we'll start by getting the shortest paths between Grey and all other characters.


In [12]:
# Shortest path between Grey and other characters
grey_shortest_path = nx.shortest_path_length(GA)['grey']
grey_shortest_path


Out[12]:
{'addison': 2,
 'altman': 4,
 'arizona': 3,
 'avery': 5,
 'colin': 7,
 'denny': 3,
 'derek': 1,
 'finn': 1,
 'grey': 0,
 'hank': 3,
 'izzie': 2,
 'karev': 3,
 'kepner': 4,
 'lexi': 4,
 'mrs. seabury': 4,
 'nancy': 4,
 "o'malley": 1,
 'olivia': 2,
 'owen': 5,
 'preston': 7,
 'sloan': 3,
 'steve': 1,
 'torres': 2,
 'yang': 6}

In [13]:
# Sum of the shortest paths to all other characters
grey_sum_shortest_path = sum(grey_shortest_path.values())  # 77

# Closeness centrality for Grey
closeness_centrality_grey = (total_nodes_minus_grey / grey_sum_shortest_path)
print("Calculated closeness centrality for Grey:", closeness_centrality_grey)

# Double check
print("Networkx closeness centrality for Grey:", nx.closeness_centrality(GA)["grey"])

check_equal(closeness_centrality_grey, nx.closeness_centrality(GA)["grey"])


Calculated closeness centrality for Grey: 0.4025974025974026
Networkx closeness centrality for Grey: 0.2216170925848345
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-13-fc22a437e78e> in <module>()
      9 print("Networkx closeness centrality for Grey:", nx.closeness_centrality(GA)["grey"])
     10 
---> 11 check_equal(closeness_centrality_grey, nx.closeness_centrality(GA)["grey"])

<ipython-input-6-ebc10af03687> in check_equal(val1, val2)
     13 
     14 def check_equal(val1, val2):
---> 15     assert (val1 == val2),"Centrality measure calculated incorrectly!"
     16     return "Values match, good job!"
     17 

AssertionError: Centrality measure calculated incorrectly!

Interesting...our calculated measure is not the same as the one in NetworkX.

What happened here?

This error occurred because the character relationship graph is not fully connected (i.e., there are groups of characters that have no relationships with the rest of the cast).


In [14]:
# View members of different subgraphs
sorted(nx.connected_components(GA), key = len, reverse=True)


Out[14]:
[{'addison',
  'altman',
  'arizona',
  'avery',
  'colin',
  'denny',
  'derek',
  'finn',
  'grey',
  'hank',
  'izzie',
  'karev',
  'kepner',
  'lexi',
  'mrs. seabury',
  'nancy',
  "o'malley",
  'olivia',
  'owen',
  'preston',
  'sloan',
  'steve',
  'torres',
  'yang'},
 {'adele', 'chief', 'ellis grey', 'susan grey', 'thatch grey'},
 {'bailey', 'ben', 'tucker'}]

To correct for this, we will use the number of nodes in Grey's connected component, rather than the total number of nodes, to calculate closeness centrality. Additionally, we'll normalize by (n-1)/(|G|-1), where n is the number of nodes in the connected part of the graph containing the node.


In [15]:
# Number of nodes in Grey subgraph, excluding Grey
total_nodes_minus_grey_sub = len(grey_shortest_path)-1   # 23

# Closeness centrality for Grey (unnormalized)
closeness_centrality_grey = (total_nodes_minus_grey_sub / grey_sum_shortest_path)   # ~0.2987

# Closeness centrality for Grey (normalized)
closeness_centrality_grey_normalized = closeness_centrality_grey * (total_nodes_minus_grey_sub/total_nodes_minus_grey)
print("Calculated closeness centrality for Grey (normalized):", closeness_centrality_grey_normalized)

# Double check
print("Networkx closeness centrality for Grey:", nx.closeness_centrality(GA)["grey"])

check_equal(closeness_centrality_grey_normalized, nx.closeness_centrality(GA)["grey"])


Calculated closeness centrality for Grey (normalized): 0.2216170925848345
Networkx closeness centrality for Grey: 0.2216170925848345
Out[15]:
'Values match, good job!'

Let's find the closeness centrality for all characters.


In [16]:
closeness_centrality = nx.closeness_centrality(GA)
closeness_centrality


Out[16]:
{'addison': 0.2892290869327502,
 'adele': 0.05161290322580645,
 'altman': 0.2337604949182501,
 'arizona': 0.21600653327888933,
 'avery': 0.19614386355209493,
 'bailey': 0.06451612903225806,
 'ben': 0.04301075268817204,
 'chief': 0.07373271889400922,
 'colin': 0.13228307076769194,
 'denny': 0.18752215526409075,
 'derek': 0.2337604949182501,
 'ellis grey': 0.08602150537634408,
 'finn': 0.17236884978820463,
 'grey': 0.2216170925848345,
 'hank': 0.18752215526409075,
 'izzie': 0.24731182795698925,
 'karev': 0.2892290869327502,
 'kepner': 0.21067303863002787,
 'lexi': 0.26253101736972706,
 'mrs. seabury': 0.21067303863002787,
 'nancy': 0.21067303863002787,
 "o'malley": 0.2708653353814644,
 'olivia': 0.2337604949182501,
 'owen': 0.19173613628126135,
 'preston': 0.13228307076769194,
 'sloan': 0.2892290869327502,
 'steve': 0.17236884978820463,
 'susan grey': 0.05161290322580645,
 'thatch grey': 0.07373271889400922,
 'torres': 0.29937747594793435,
 'tucker': 0.04301075268817204,
 'yang': 0.1594814591498342}

In [17]:
# top 5
sorted(closeness_centrality.items(), key=lambda x: x[1], reverse=True)[:5]


Out[17]:
[('torres', 0.29937747594793435),
 ('sloan', 0.2892290869327502),
 ('addison', 0.2892290869327502),
 ('karev', 0.2892290869327502),
 ("o'malley", 0.2708653353814644)]

In [18]:
# apply measurements back to Graph
nx.set_node_attributes(GA, 'closeness centrality', closeness_centrality)

In [19]:
# average distance of torres:
1 / closeness_centrality['torres']


Out[19]:
3.340264650283554

In [20]:
# Double-checking karev's degree centrality by hand: 7 partners / 31 other nodes
7/(len(GA.nodes()) - 1)


Out[20]:
0.22580645161290322

Betweenness Centrality

Betweenness centrality quantifies the number of times a node acts as a bridge (or "broker") along the shortest path between two other nodes.

In this conception, vertices that have a high probability of occurring on a randomly chosen shortest path between two randomly chosen vertices have high betweenness.

$$ C_B(v) =\sum_{s,t \in V} \frac{\sigma(s, t|v)}{\sigma(s, t)} $$

where ${\sigma(s, t)}$ is total number of shortest paths from node ${s}$ to node ${t}$ and ${\sigma(s, t|v)}$ is the number of those paths that pass through ${v}$.
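To see the formula in action, consider a hypothetical 3-node path a - b - c (not our dataset). The only s-t pair not involving b is (a, c); its single shortest path passes through b, so b's raw betweenness is 1/1 = 1.

```python
import networkx as nx

# Hypothetical 3-node path: a - b - c
P = nx.Graph([("a", "b"), ("b", "c")])

# normalized=False returns the raw pair counts from the formula above
print(nx.betweenness_centrality(P, normalized=False))
# -> {'a': 0.0, 'b': 1.0, 'c': 0.0}
```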

Why is betweenness centrality important?

Suppose that I want to influence you by sending you information, or make a deal to exchange some resources. But, in order to talk to you, I must go through an intermediary. For example, let's suppose that I wanted to try to convince the Chancellor of my university to buy me a new computer. According to the rules of our bureaucratic hierarchy, I must forward my request through my department chair, a dean, and an executive vice chancellor. Each one of these people could delay the request, or even prevent my request from getting through. This gives the people who lie "between" me and the Chancellor power with respect to me... Having more than one channel makes me less dependent, and, in a sense, more powerful. [Source]

While this metric seems less relevant to our particular network, let's create a hypothetical situation:

To engage with a new romantic partner, you need permission from at least one of their former partners, and you can only send your request through existing partners (awkward 😬). Betweenness centrality would tell you which actors have the most effective veto power to thwart random requests made by one member of the network to another.


In [21]:
betweeness_centrality = nx.betweenness_centrality(GA)
betweeness_centrality


Out[21]:
{'addison': 0.09480286738351254,
 'adele': 0.0,
 'altman': 0.16344086021505377,
 'arizona': 0.0,
 'avery': 0.0,
 'bailey': 0.002150537634408602,
 'ben': 0.0,
 'chief': 0.0064516129032258064,
 'colin': 0.0,
 'denny': 0.0,
 'derek': 0.03860215053763442,
 'ellis grey': 0.008602150537634409,
 'finn': 0.0,
 'grey': 0.10078853046594982,
 'hank': 0.0,
 'izzie': 0.10311827956989247,
 'karev': 0.20487455197132612,
 'kepner': 0.0,
 'lexi': 0.07741935483870968,
 'mrs. seabury': 0.0,
 'nancy': 0.0,
 "o'malley": 0.11702508960573478,
 'olivia': 0.01064516129032258,
 'owen': 0.12903225806451613,
 'preston': 0.0,
 'sloan': 0.248100358422939,
 'steve': 0.0,
 'susan grey': 0.0,
 'thatch grey': 0.0064516129032258064,
 'torres': 0.14440860215053763,
 'tucker': 0.0,
 'yang': 0.09247311827956989}

In [22]:
# top 5
sorted(betweeness_centrality.items(), key=lambda x: x[1], reverse=True)[:5]


Out[22]:
[('sloan', 0.248100358422939),
 ('karev', 0.20487455197132612),
 ('altman', 0.16344086021505377),
 ('torres', 0.14440860215053763),
 ('owen', 0.12903225806451613)]

Eigenvector Centrality

A node has high eigenvector centrality if it is connected to many other nodes that are themselves well connected. Each node's eigenvector centrality is its component of the eigenvector associated with the largest eigenvalue of the graph's adjacency matrix.

Figure (not shown) — panel C: eigenvector centrality; panel D: degree centrality.

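Under the hood, this eigenvector can be found by power iteration: repeatedly set each node's score to the sum of its neighbors' scores, then renormalize. Here is a minimal sketch of that idea on a small hypothetical graph (a star plus one extra edge), not our dataset:

```python
import networkx as nx

# Hypothetical graph: 'hub' connects to everyone, plus one edge a - b
G = nx.Graph([("hub", "a"), ("hub", "b"), ("hub", "c"), ("a", "b")])

# Power iteration: score(n) <- sum of neighbors' scores, renormalized
scores = {n: 1.0 for n in G}
for _ in range(100):
    new = {n: sum(scores[m] for m in G[n]) for n in G}
    norm = max(new.values())
    scores = {n: v / norm for n, v in new.items()}

# The hub, connected to every other node, ends up with the top score
print(max(scores, key=scores.get))  # -> 'hub'
```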

In [23]:
eigenvector_centrality = nx.eigenvector_centrality_numpy(GA)
eigenvector_centrality


Out[23]:
{'addison': 0.27840139594529484,
 'adele': 5.058876862528936e-17,
 'altman': 0.10442628192357334,
 'arizona': 0.10564201543690817,
 'avery': 0.07734385472828524,
 'bailey': -2.195339193913248e-16,
 'ben': -1.5057529497486453e-16,
 'chief': 4.108700022015808e-17,
 'colin': 0.0035243897348137374,
 'denny': 0.0832030185430171,
 'derek': 0.12570740328311875,
 'ellis grey': 1.969945244481185e-18,
 'finn': 0.04422017135181157,
 'grey': 0.1510783608855737,
 'hank': 0.08320301854301704,
 'izzie': 0.28426338654827654,
 'karev': 0.5027687871890412,
 'kepner': 0.14715887695313748,
 'lexi': 0.2642455341015448,
 'mrs. seabury': 0.14715887695313762,
 'nancy': 0.09444834886225349,
 "o'malley": 0.3020119709505961,
 'olivia': 0.23555685153699446,
 'owen': 0.03408964112637771,
 'preston': 0.003524389734813675,
 'sloan': 0.32268309457542504,
 'steve': 0.044220171351811575,
 'susan grey': -2.2838591730981282e-17,
 'thatch grey': -2.687699295880897e-17,
 'torres': 0.36092629324926223,
 'tucker': -1.3686099029957458e-16,
 'yang': 0.01204108912245926}

In [24]:
max_value = max(eigenvector_centrality.items(), key=lambda x: x[1])

ec_scaled = {}
for k in eigenvector_centrality.keys():
    ec_scaled[k] = eigenvector_centrality[k] / max_value[1]

# Scaled by the most central character (karev)
sorted(ec_scaled.items(), key=lambda x:x[1], reverse=True)[0:5]


Out[24]:
[('karev', 1.0),
 ('torres', 0.7178772876239707),
 ('sloan', 0.6418121068722911),
 ("o'malley", 0.6006975346244785),
 ('izzie', 0.5653958515157256)]